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Abstract 



We determine stability and attractor properties of random Boolean genetic network 
models with canalyzing rules for a variety of architectures. For all power law and 
exponential as well as flat in-degree distributions, we find that the networks are 
dynamically stable. Furthermore, for architectures with few inputs per node, the 
dynamics of the networks is close to critical. In addition, the fraction of genes that 
are active decreases with the number of inputs per node. These results are based 
upon investigating ensembles of networks using analytical methods. Also, for different 
in-degree distributions, the numbers of fixed points and cycles are calculated, with 
results intuitively consistent with the stability analysis; fewer inputs per node implies 
more cycles, and vice versa. There are hints that genetic networks acquire broader 
degree distributions with evolution and hence our results then indicate that for single 
cells, the dynamics should become more stable with evolution. However, such an 
effect is very likely compensated for by multicellular dynamics, since one expects less 
stability when interactions between cells are included. We verify this by simulations 
of a simple model for interactions between cells. 



2 



1 Introduction 



With the advent of high-throughput genomic measurement methods, it is soon within 
reach to apply reverse engineering techniques and map out genetic networks inside 
cells. These networks should perform a task, and, very importantly, be stable from a 
dynamical point of view. It is therefore of utmost interest to investigate the generic 
properties of networks models, such as architecture, dynamic stability and degree of 
activation as functions of system size. Random Boolean networks have for several 
decades received much attention in these contexts. These networks consist of nodes, 
representing genes and proteins, connected by directed edges, representing gene regu- 
lation. The number of inputs to and outputs from each node, the in- and out-degrees, 
are drawn from some distribution. 

It has been shown that with a fixed number, K, of inputs per node such network 
models exhibit some interesting properties [1]. Specifically, for K = 2 the dynamics is 
critical, i.e., right between stable and chaotic. Furthermore, one might interpret the 
solutions, fixed points and cycles, as different cell types. Being critical is considered 
advantageous since it should promote evolution. These results were obtained with no 
constraints on the architectures and assumed a flat distribution of the Boolean rules. 

It appears natural to revisit the study of Boolean network ensembles, given recent 
experimental hints on network architectures and rule distributions. For transcrip- 
tional networks, there are indications from extracted gene-gene networks that, for 
both E. coli [2] and yeast, [3] the out-degree distribution is of power law nature. The 
in-degree distribution appears to be exponential in E. coli, whereas it may equally 
well be a power law in yeast. In [4] stability properties of random Boolean networks 
were probed with power law in-degree distributions, and regions of robustness were 
identified. 

The distribution of rules is likely to be structured and not random. Indeed, in a 
recent compilation [5] (see also [6]), all rules are canalyzing [1]; a canalyzing Boolean 
function has at least one input, such that for at least one input value, the output 
value is fixed. It is not straightforward to generate biologically relevant canalyzing 
functions. In [6] the notion of nested canalyzing functions was introduced, which 
facilitates generation of rule distributions. 
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We find that networks with nested canalyzing rules are stable, for all power law 
and exponential in-degree distributions. Furthermore, as the degree distribution gets 
flatter, one moves further away from criticality. Also, the average number of active 
genes (fraction of genes that take the value 'true') is predicted for different powers. 

There are experimental hints that the in-degree distributions get flatter with genome 
size. This could be understood intuitively, since higher organisms in general have 
acquired more complexity in terms of redundancy in signal integration. In such a 
picture, our robustness analysis indicates that with multicellular species one would 
move away from critical dynamics, thereby making evolution less accessible. How- 
ever, the picture is complicated by the presence of extracellular interactions. We 
model these with a simple scheme allowing for 5-10% extracellular traffic, and, not 
unexpectedly, the system, though still robust, moves towards criticality. 



2 Methods and Models 



2.1 Degree Distributions 

Our results turn out to be qualitatively equivalent for power law, exponential and 
flat in-degree distributions. (By flat we mean a uniform distribution for up to K max 
inputs.) In what follows, we choose to illustrate the results with a power law (often 
denoted scale-free) distribution, with a cutoff in the number of inputs, K, 

i — if 1< K < N 
p% = P(#inputs = K) oc lK~< . (l) 

y otherwise 

where N is the number of nodes. In yeast protein-protein networks [7], and also 
in other applications, e.g., the Internet and social networks, 7 appears to lie in the 
range 2 to 3. In our calculations, we explore the region to 5, varying N from 20 to 
infinity. The connectivity of gene-gene networks extracted from yeast [3] appear to 
behave like in Eq. Q for the in-degree distributions, with an exponent 7 in the range 
1.5-2. For E. coli [2] data, an exponential fits somewhat better than a power law, but 
the data are statistically inconclusive. For the mammalian cell cycle genes, slightly 
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lower 7 has been extracted [8]. The average number of inputs varies with N and 7, 
and grows with decreasing 7. For very high 7, it is essentially 1. In Fig. 1, typical 
network realizations for N = 20 are shown for 7 = 1, 2 and 3, respectively. 



2.2 Canalyzing Boolean Rule Distributions 

In most studies of Boolean models of genetic networks, all Boolean rules have been 
employed [1]. In a previous paper, we introduced nested canalyzing rules and showed 
that it is possible to generate a distribution of such rules, that fits well with rule data 
from the literature [6]. 

A canalyzing rule is a Boolean rule with the property that one of its inputs alone 
can determine the output value, for either 'true' or 'false' input. This input value is 
referred to as the canalyzing value, the output value is the canalyzed value, and we 
refer to the rule as being canalyzing on this particular input. 

Nested canalyzing rules are a very natural subset of canalyzing rules, stemming from 
the question of what happens in the non-canalyzing case. That is, when the rule does 
not get the canalyzing value as input, but instead has to consult its other inputs. If 
the rule then is canalyzing on one of the remaining inputs, and again for that input's 
non-canalyzing value, and so on for all inputs, we call the rule nested canalyzing. All 
but six of the roughly 150 observed rules were nested canalyzing [6]. 

With I m and O m denoting the canalyzing and canalyzed values, respectively, and 
suitable renumbering of the inputs, ii, . . . ,ix, the output, o, of a nested canalyzing 
rule can be expressed on the form 

' Oi if z'i = h 



(2) 



02 if %\ 7^ I\ and ii = I2 

3 if ii Ii and i 2 ^ h and i 3 = I 3 

Ok if i\ 7^ h and • ■ ■ and %k-\ 7^ h<-\ and %k = Ik 

0defauit if k ^ h and • • • and i K ^ I K . 



For each value of K, we generate a distribution of if-input rules, with the inputs to 
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each rule taken from distinct nodes. All rules should depend on every input, and this 
dependency requirement is fulfilled if and only if Odefauit = not Ox- Then, it remains 
to choose values for Ji, . . . , Ik and 0±, . . . , Ok- These values are independently and 
randomly drawn with the probabilities 

p ( i m = tr ue) = m „ = true) = i ; x e p x ( p - 2 ;;:' a) (3) 

for m = 1, . . . , K, where a is a constant. Eq. El can be seen as a way to put a penalty 
both on outputting 'true' and on letting a 'true' input determine the output. More 
precisely: Let / be the fraction of 'true' outputs in the truth table, and let g be the 
fraction of input states such that the first input that has its canalyzing value is 'true'. 
Then, the probability to retrieve a specific rule is proportional to exp[— a(f + g)/2]. 

Our rule distribution fits observed data well [6], given that a = 7. For all generated 
distributions, we keep a = 7. A high value of a means a high penalty on active genes, 
while a = means equal probabilities for activity and inactivity. 

2.3 Robustness Calculations 

We wish to address the question of robustness in network models. In a stable sys- 
tem, small initial perturbations should not grow in time. In [6], this was probed by 
monitoring how the Hamming distance H between random initial states evolved in 
a "Derrida plot" [9]. Specifically, the slope in the low- if region shows the fate of a 
small perturbation after a single time step. This implicitly assumes that 'true' and 
'false' are equally probable in the initial states. 

Our chosen distribution of Boolean rules will favor 'false' node values. Preferably, 
a robustness measure should reflect the network properties in the vicinity of the 
equilibrium distribution, where the in- and out-degree distributions of 'true' and 
'false' are identical. See Appendix A. We therefore define the robustness for an 
ensemble of A r -node networks as the average effect, after a single time step, of a small 
perturbation at this equilibrium distribution. 

To compute r^r, we introduce the total sensitivity, S(R), of a given i^-input rule R. 
S(R) is the sum of the probabilities that a single flipped input will alter the output 
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of R. Thus, 



K 



s(r) = J2p[R(h,... 



, 0, ...,ik 



(4) 



^ R(i ly . . . ■ -,1k)] 



where the probability is calculated over the equilibrium distribution of input values, 
ii, . . . , %k- Then, r is given by r = (S(R)) where the average is taken over all rules in 
a given network (see Appendix A). This also allows us to calculate r when the rules 
are drawn from a distribution. Note that r is calculated without any assumption on 
how the inputs to the rules are chosen. This means that r stays the same for any 
output connectivity, and for any way to build a network containing a certain set of 
rules. In other words, r is a strictly local stability measure that is independent of 
whether the network is divided into some kind of clusters or not. 

Let Sk denote the average of S(R) over a distribution of if-input rules. The average 
robustness of a randomly chosen network with N nodes is then given by 

A? 



See Appendix A on calculation of Sk for nested canalyzing rules. With the nested 
canalyzing rule distribution, defined by Eq. EJ Sk < 1 for K — 2, 3, . . ., provided that 
q^O. 5*i is always 1, since every rule has to depend on all of its inputs, (a = yields 
Sk = 1 for all K.) This means that every network ensemble with the canalyzing rule 
distribution, and a ^ 0, that not solely consists of one-input nodes, is stable in the 
sense that r < 1. 

2.4 Number of Attractors 

Attractors in the Boolean model can be seen as distinct cell types [1] . It is, however, 
not straightforward to tell which attractors are biologically relevant. First, one can 
ask what cycle lengths are relevant. Second, the attractor basin sizes vary in a very 
broad range, and attractors with small attractor basins may be biologically irrelevant. 

The broad distribution of attractor basin sizes also means that the number of attrac- 
tors found by random sampling is strongly dependent on the number of samples [10]. 




(5) 



K=l 



7 



We choose to monitor the number of attractors, (Cl)n, of different periods, L, using 
exact methods (for the limit N — > oo) and full enumerations of the state space (for 
small networks). 

Given that r < 1, which means that the network is subcritical, the average number 
of attractors of a certain length, L, will approach a constant, (Cx)oo> as the system 
size, N, approaches infinity. We find an analytic expression for (Cx)^, and find that 
it is qualitatively consistent with results from a full state space search for N = 20. 

To investigate the limit N — > oo, we split up the robustness measure, r, into r c 
and r 1 , where r is the average number of outputs that, after one time step, will be 
affected by one flipped node. We define r c and r 1 as the numbers of nodes that copy 
respectively invert the state of the flipped node. This means, e.g., that r = r c + r l . 
For convenience, we define Ar = r c — r . 



(Cl)oo can be expressed as a function of r^ and Aroo for each L. For L = 1, 2, 3 we 
get 

{ 2)o ° = 2(l-r 00 )(l-[Ar 00 ]2) 1 ^ 

/r v + (Arpp) 3 ~ ^(Ar^) 3 

1 3; °° 3(l-r3 o )(l-Ar 00 )(l-[Ar 00 ]3) " ^ 
See Appendix B, where the derivations needed to calculate (Cx)oo are presented. 

The canalyzing rule distribution satisfies r c > r 1 , meaning that Ar > 0. This 
condition yields an increased number of fixed points and attractors of odd length, 
compared to the symmetric case Ar = 0. 

It is interesting to note that the limit of the total number of attractors 

00 

(C)oo = j^CWoo (9) 
L=l 

is convergent for r < 1/2 and divergent for r > 1/2. This transition occurs at 
7 = 1.376, with convergence below this value. See Supporting Text for details. 
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2.5 Tissue Simulations 



In multicellular organisms, we expect communication between cells to influence the 
network dynamics. In the real world, there exist several different types of intercellular 
signaling. Here we make an initial exploration using a simple model. Nevertheless, 
we think the results reflect some core properties. 

In our model, each cell communicates with its four nearest neighbors on a square lat- 
tice with periodic boundary conditions. All cells have the same genotype, and hence 
identical internal network architecture and rules. Each connection in the network rep- 
resents how a gene product influences the transcription of some gene. Some molecules 
or signals can cross cell membranes, and possibly diffuse far, but we only consider 
the case of local cell-to-cell signaling at the level of specific gene-gene interactions. 
Specifically, a fraction, k, of the connections are flagged as being intercellular, and 
for such connections the value is 'true' if any of the four neighbors has 'true'. 

For the overall robustness of such tissue networks, it is not sufficient to measure the 
robustness r, since r only depends on interactions during a single time step, during 
which a perturbation only can propagate to the nearest neighbor cells. Hence, we 
desire a multi-step robustness measure, which requires simulations, since it is outside 
the scope of our analysis. Rather than following trajectories from random initial 
states, we have chosen to identify fixed points, perturb these randomly by Hamming 
distance H(0) = 1, and then track the mean of H(t) for 20 time steps. In our 
simulations, we generate ensembles of networks using 5x5 lattices of cells, where 
each cell contains an identical network of N = 50 genes. 



3 Results and Discussion 

Three major findings emerge from Fig. 2, where the average robustness r and the 
fraction of active genes are shown as functions of 7 in Eq. ^ 

1. The dynamics of the networks is always stable, regardless of the power in the 
in-degree distribution. 
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2. The stability of the networks increases with the average number of inputs to 
the nodes. For flatter distributions, r approaches the critical value, r = 1. 

3. The average number of active genes decreases with increasing in-degree. 

Not unexpectedly, the number of attractors increases as the networks approach crit- 
icality, see Fig. 3. This increase is particularly rapid for long cycles. These results 
were obtained with analytical calculation and exhaustive enumeration of state space. 
Given the undersampling problems when simulating Boolean networks [10], the fea- 
sibility of analytical calculations is crucial for drawing the firm conclusions above. 
However, we did not attempt to extract the distribution of attractor basin sizes. In 
future research, it could be interesting to compare with the exact results for one-input 
networks in [11]. 

The results in Figs. 2 and 3 are shown for power law in-degree distributions. However, 
they are quite general. The corresponding curves for other distributions exhibit 
very similar behaviour, when the x-axis (7) has been transformed to appropriate 
parameters for other distributions. In all tested degree distributions, constant nodes 
(K = 0) are excluded. Recall that we also exclude rules that have redundant inputs. 
Thus, for low values of the average K, most of the rules will be copy- or invert- 
operators, which puts the network close to criticality. The strong stability for wide 
in-degree distributions, however, is a result of the canalyzing property of the rules, 
which makes forcing structures [1] prevalent. 

From analysis of network data from yeast [3] and the mammal cell cycle [8], there 
are indications that 7 decreases with the number of genes. Within the framework 
of our results, this means that, as the genome size increases, the networks get more 
stable. However, with increased number of genes, multicellularity becomes common. 
Including interactions between cells should make the overall system less stable. In- 
deed, when investigating this issue by simulations of a simplified tissue model, the 
stability decreases with interconnectivity between the cells, k, as can be seen from 
Fig. 4. 

We predict how the average number of active genes increases with 7. This may not be 
easy to verify, given that such a number will depend upon experimental conditions. 
It should be pointed out that the order of magnitude of active genes is set by the 
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rule generator in Eq. El which is derived from from fitting to the observed rules in [5] , 
many of which originate from Drosophila melanogaster. The fitted parameter a sets 
the scale of the fraction of active genes, with high a corresponding to low gene activity 
and vice versa. The qualitative behavior is, however, rather insensitive to the value 
of a. 

In summary, we have designed and analyzed a class of Boolean genetic network 
models consistent with observed interaction rules. The emerging ensemble properties 
do not only exhibit remarkable stability for the dynamics, but also several generic 
properties that make predictions, such as how stability varies with genome size, and 
how the number of active genes depends on the in-degree distribution. Since the 
single-cell calculations are performed analytically, the results are transparent in terms 
of understanding the underlying dynamics. 
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Appendix A — Robustness 



The stability measure, r, expresses the average number of perturbed nodes one time 
step after perturbing one node, given that the network has reached equilibrium in a 
mean field sense. Both the mean field equilibrium distribution (of 'true' and 'false') 
and r are closely related to attractors in the true, non-mean field dynamics. See 
Appendix B. 

Let W(w) denote the probability that a randomly chosen rule will output 'true', given 
that the input values are randomly and independently chosen with probability w to 
be 'true'. Let Wk(w) denote W (w) when the selected rule has K inputs. Then, the 
equilibrium probability for an iV-node network, , satisfies 

N 

< = W«) = P%Wk(w») . (10) 

K=l 

Eq. El can be solved numerically for nested canalyzing rules, taking advantage of the 
fast (exponential) convergence of Wk(w) as K — > oo and using standard (integration- 
based) methods to calculate the sum in the case that N — > oo. Note that is only 
referring to the mean field equilibrium distribution, which is essentially the same as, 
but not identical to, the distribution of 'true' and 'false' after many time steps in a 
simulation. 

For nested canalyzing rules, Wk{w) is given by 

K 

W K (w) = Pi C ■ ■ ■ P^-iPmOk = true) 

fc=l (.11) 

+ pnc...pncp (0dcfauit = tme) 

where P£ an = P{ik = h) and P£ c = P{ik ^ h)- The input values, i±, . . . are 
'true' with probability w, while the corresponding probabilities for Ii,...,Ik and 
Oi, . . . , Ok are given by Eq. El 

Let r(R) denote the probability that the rule R is dependent on a randomly picked 
input, given that the other inputs are independently set to 'true' with probability 
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w^ q . We can express r for a specific iV-node network with rules Ri, . . . , as 



N N 
1 j=i 



where K(Ri) is the number of inputs to Ri. We have defined S(R) = K(R)r(R), so 
that r is the average of S(R) over all rules in the network. This is also valid for a 
distribution of networks, meaning that 

N 

tn = Y,PkSk (13) 

K=l 

for iV-node networks, where Sk is the average of S(R) when random ZT-input rules 
are chosen. 

Eq. is exact, given that the state of the network is randomly picked from the 
mean field equilibrium distribution of 'true' and 'false'. Since the derivations are 
completely independent of the specific network architecture, this result holds for any 
procedure to generate architectures, as long as the average fraction of nodes with K 
inputs are given by p 1 ^ (over many network realizations). 

For nested canalyzing rules, we can calculate as a sum of probabilities. If the rule 
is canalyzing on input k, and changing i k makes the rule canalyze on input j, there 
is some probability that the output value changes. The case that the rule falls back 
to Odefauit corresponds to the last term in the square brackets. 



K-l 

C _ \ r pnc pnc 

k=l 



K 



i=k+l (14) 
+ ' ' " Pl(P{Ok 7^ ^default) 

where P fe can = P(i k = I k ) and = P(i k ^ I k ). 

Let v and V(v,w) denote the fraction of input and output values, respectively, that 
are toggled (from 'false' to 'true' or vice versa) when going one step forward in time, 
given that the fraction 'true' input values is w both before and after the input is 
toggled. Then, V(0, w) = 0, since constant input renders constant output. A small 
change in v will change the output with r times the change v. This means that 
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d v V(v, w cq ) < r, where the inequality comes from the possibility that new changes 
undo previous ones, as v is increasing. Combining these relations yields 



which means that, for r < 1, V(v, w cq ) < v with equality if and only if v = 0. Hence, 
frozen states, where the fraction of 'true' nodes is u> eq , are the only solutions to the 
mean-field dynamics, given that r < 1, which is true for the nested canalyzing rule 
distribution. 

Note that v can be seen as the distance, i.e., fraction of differing states, between 
two separate time series. Then, the mapping v i— > V(v,w DCl ) gives the evolution of 
the distance during one time step. Similar calculations have been carried out in 
e.g., [12,13], yielding results consistent with Eq. [T3J 



To calculate the average number of attractors, we use the same approach as in [10]. 
This approach means that we first transform the problem of finding an L-cycle to a 
fixed point problem, and then find a mathematical expression for the average number 
of solutions to that problem. 

Assume that a Boolean network performs an L-cycle. Then, each node performs one 
of 2 L series of output values. We call these L-cycle series. Consider what a rule does 
when it is subjected to such L-cycle series on the inputs. It performs some Boolean 
operation, but it also delays the output, giving a one step difference in phase for the 
output L-cycle series. If we view each L-cycle series as a state, an L-cycle turns into 
a fixed point (in this enlarged state space). L(Cl)n is then the average number of 
input states (choices of L-cycle series), for the whole network, such that the output 
is the same as the input. 

Let Q denote a specific choice of L-cycle series for the network, and let QJl be the 
set of all Q such that Q is a proper L-cycle. A proper L-cycle has no period shorter 




(15) 



Appendix B — Attractors 
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than L. The average number of L-cycles is then given by 

(C L )N = \Y. P L^) ( 16 ) 

where P/ V (Q) denotes the probability that the output of the network is the same as 
the input Q. 

Since the inputs to each i^-input rule are chosen from a flat distribution over all nodes, 
P/ V (Q) is invariant with respect to permutations of the nodes. Let n = (no, • • • , n m _i) 
denote the number of nodes expressing each of the m = 2 L series. For rij, we refer 
to i as the index of the considered L-cycle series. For convenience, let the constant 
series of 'false' and 'true' have indices and 1, respectively. Then, 

w»4e (TK«) < i7 > 



nev? 



where ( ) denotes the multinomial N\ / (uqI ■ ■ ■ ra m _i!) and is the set of partitions 
n of N such that Q G Q^. That is, n represents a proper L-cycle. 

Let A l L (x.) denote the probability that a randomly selected rule will output L-cycle se- 
ries i, given that the input series are selected from the distribution x = (x , . . . , x m _i). 
Since a node may not be used more than once as an input to a specific rule, A l L (x) 
should also depend on N. However, the difference between allowing or not allow- 
ing coinciding inputs vanishes as N goes to infinity, since the output is effectively 
determined by relatively few inputs for nested canalyzing rules. 

Since these calculations aim to reveal the asymptotic behavior as N — > oo, we allow 
for coinciding inputs in the following. Then, we get 



p l (q) = n l^/N 



0<i<m 



By combining Eqs. IT7I and ITH1 and applying Stirling's formula to ( n ), we get 



4l ^" e"^™ (19) 



0<«<m 
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where 



/i( x ) = Y Xiln 



0<i<m 



(20) 



See Supporting Text regarding the use of Stirling's formula. 

Equation I2TH can be seen as an average (ln[iA^(x)]) j with weights xo, 
Hence, the concavity of x i— ► In x yields 

1 



111 



-^l(x) 



1 



<ln( -Ai(x] 



In £ Ai(x) < 

0<i<m 



(21) 



with equality if and only if 

x = A L (x) (22) 

where A L (x) = (x), . . . , A™ _1 (x)) . Eq.l22lcan be seen as a criterion for mean field 
equilibrium in the distribution of L-cycle series. This makes it possible to connect 
quantities observed in mean field calculations to the full non-mean field dynamics. 

Since Nfi(x) occurs in the exponent in Eq. El the relevant contributions to the sum 
in Eq.^Jmust come from surroundings to points where Eq. |221is satisfied as N — * oo. 
(/z,(x) is not a continuous function, but obeys the weaker relation "For all x, in the 
definition set of fi, and e > 0, there is a 5 > such that /l(x') < /l(x) +e holds for 
all x' satisfying |x' — x| < 5 in the definition set of /x,." which is a sufficient criterion 
in this case.) Eq. ITUl provides an upper bound to (Cl)n, since the approximation 
of (^) is an overestimation, see Supporting Text. Hence, the relevant regions in the 
exact sum, Eq. H3 surround solutions to Eq. E21 

Given that Eq. E21 holds, the fraction of 'true' values, itf(x), and the fraction of 
togglings u(x), in the distribution of L-cycle series, should be consistent with the 
mean field dynamics, meaning that w(x) = w c<i and u(x) = 0. (See Appendix A.) 
This means that a typical attractor, for large N, has only a small fraction (which 
approaches for iV — > oo) of active (non-constant) nodes. 

For large N, we want to investigate the number of attractors for certain numbers 
of active nodes. Hence, we divide the summation in Eq. El into constant and non- 
constant patterns. In order to give the sum a form that can be split in a convenient 
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way, we introduce the quantity (Ql)ni which we define as the average number of 
states that are part of a cycle with a period that divides L. See Supporting Text on 
how to express (Cl)n m terms of (Ql)n- As N — > oo, the summation over constant 
patterns has a limit that yields 



w~ = £ n • 

neN™- 2 2<i<m 1 

where n = (n 2 , . . . , n m -i), and N denotes the set of non-negative integers. See 
Supporting Text. 

The elements djA l L of the gradient VA l L are nonzero only if the L-cycle series i can 
be the output of a node which has only one non-constant input retrieving the L-cycle 
series j. This is true if the series j is the series i, or the inverse of i, rotated one step 
backwards in time. Let and 4>\(i) denote those values of j, respectively. Then, 

n-VAl = r c % c {t) +r\ l{i) . (24) 

Equation EU yields that the sum in Eq. I2B1 factorizes into subspaces, spanned by 
sets of L-cycle series indices of the type [i, 4>2(i), (p l L (i), (p^ ° 0^(0, 4>l ° ^lOO> • • •} 
containing all possible results of repeatedly applying 0^ an d 4>\ to z - We call those 
sets invariant sets of L-cycle series, which is the same as invariant sets of L-cycle 
patterns in [10], but formulated with respect to L-cycle series instead of L-cycle 
patterns. Let p° L , . . . , p^" 1 denote the invariant sets of L-cycle series, where Hl is 
the number of such sets. For convenience, let p° L be the invariant set {0, 1}. 

Consider an invariant set of L-cycle series, p\. Let I be the length of p\, meaning 
that I is the lowest number such that, for % G p\, (<f>z,Y(i) is either % or the index of 
series i inverted. If {^>l) v) = h we sa Y that the parity of p\ is positive. Otherwise 
the parity is negative. The structure of an invariant set of L-cycle series is fully 
determined by its length and its parity. Such a set can be enumerated on the form 
{(j>2(i), . . . , . . . ,0i o and = % for positive parity 

while (j) l L o (i) = i for negative parity. 

Let strings of T and F denote specific L-cycle series. Then <^(FFFT) = FFTF 
and 0^(FFFT) = TTFT. Examples of invariant sets of 4-cycle series are {FFFT, 
FFTF, FTFF, TFFF, TTTF, TTFT, TFTT, FTTT} and {FTFT, TFTF}. The first 
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example has length 4 and positive parity, while the second has length 1 and negative 
parity. 



Let vf, z/f, . . . , vf, denote the numbers, rii, of occurrences of L-cycle series belong- 
ing to p\, in such a way that v. = n,i <^> vf_ x = n^c^ and vf = <^ vf_ x = n^i ^. 
For convenience, we introduce vf as a renaming of vf. There are two ways that 
vf can be connected to vf: either vf = vf (positive parity) or vf = vf (negative 
parity). 

Each invariant set of L-cycle series, p\, contributes to Eq. ESI with a factor 

9(P h L ) = gf= £ ^ 

where 

G U X U ^ = exp(-^)^f eX p(-^)^2!!_ (26 ) 

^r^ + r 1 ^ , (27) 

and vf = vf for gf while ^ = ^ for gj^- Eq. |2E]is interpreted with the convention 
that 0° = 1 to handle the case where vf or v7 =0. 

Although the right hand side in Eq. EE] looks nasty, it can be calculated yielding the 
expression 

± 1 1 
9i ~ i- r n T ( Ar)< ■ (28) 

See Supporting Text. 

Now, we can write Eq. E31 as 

, H L -1 

Woo = YZTa^ II 9(A) (29) 

h=l 

where gip^) is calculated according to Eq. EB1 The period, £, and the parity, + or — , 
of a given invariant set of L-cycle series can be extracted by enumerating all L-cycle 
series. This provides a method to calculate (^l)oo for small L. See Supporting Text 
on how to calculate oo in an efficient way. 
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Figure captions 



Fig. 1. Examples of N = 20 networks with 7 = 1 (a), 7 = 2 (b) and 7 = 3 (c). 

Fig. 2. Robustness, r, and the probability of a node being 'true', w, at the equilib- 
rium distribution of Boolean values, as functions of the exponent 7 in the in-degree 
distribution p K (Eq. [TJ) for N = 20 (dotted), 100 (dashed) and 00 (solid). 

Fig. 3. The number of attractors as function of the exponent 7 in the in-degree 
distribution px (Eq. QJ) for N = 20 (thick lines) and N — > 00 (thin lines). The 
curves show the cumulative number of attractors of length L for L = 1 (solid), L < 2 
(dashed) and L < 00 (dotted). The values for N — > 00 were calculated analytically, 
whereas the values for iV = 20 are taken from full enumerations of the state space 
for at least 5000 networks, with more networks at higher 7. 

Fig. 4. The average time evolution of perturbed fixed points in 5 x 5 cell tissues with 
periodic boundary conditions and N = 50 nodes (over many network realizations). 
Simulations from random initial states in generated networks were used to locate 
fixed points, which were perturbed by toggling the value of a single node. The 
mean distances to the unperturbed fixed point, (H(t)), as given by 20 subsequent 
simulation steps, is shown for 7 = 2 (a) and 7 = 3 (b), for three different degrees of 
cell connectivity: k = (solid), 0.05 (dashed) and 0.1 (dotted). 
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